Search results for " data structure."

showing 10 items of 88 documents

Binary jumbled string matching for highly run-length compressible texts

2012

The Binary Jumbled String Matching problem is defined as: Given a string $s$ over $\{a,b\}$ of length $n$ and a query $(x,y)$, with $x,y$ non-negative integers, decide whether $s$ has a substring $t$ with exactly $x$ $a$'s and $y$ $b$'s. Previous solutions created an index of size O(n) in a pre-processing step, which was then used to answer queries in constant time. The fastest algorithms for construction of this index have running time $O(n^2/\log n)$ [Burcsi et al., FUN 2010; Moosa and Rahman, IPL 2010], or $O(n^2/\log^2 n)$ in the word-RAM model [Moosa and Rahman, JDA 2012]. We propose an index constructed directly from the run-length encoding of $s$. The construction time of our index i…

FOS: Computer and information sciencesString algorithmsStructure (category theory)Binary numberG.2.1Data_CODINGANDINFORMATIONTHEORY0102 computer and information sciences02 engineering and technologyString searching algorithm01 natural sciencesComputer Science - Information RetrievalTheoretical Computer ScienceCombinatoricsdata structuresSimple (abstract algebra)Computer Science - Data Structures and AlgorithmsString algorithms; jumbled pattern matching; prefix normal form; data structures0202 electrical engineering electronic engineering information engineeringParikh vectorData Structures and Algorithms (cs.DS)Run-length encodingMathematics68W32 68P05 68P20String (computer science)prefix normal formSubstringComputer Science Applicationsjumbled pattern matching010201 computation theory & mathematicsData structureSignal ProcessingRun-length encoding020201 artificial intelligence & image processingConstant (mathematics)Information Retrieval (cs.IR)Information SystemsInformation Processing Letters

researchProduct

Alignment-free sequence comparison using absent words

2018

Sequence comparison is a prerequisite to virtually all comparative genomic analyses. It is often realised by sequence alignment techniques, which are computationally expensive. This has led to increased research into alignment-free techniques, which are based on measures referring to the composition of sequences in terms of their constituent patterns. These measures, such as $q$-gram distance, are usually computed in time linear with respect to the length of the sequences. In this paper, we focus on the complementary idea: how two sequences can be efficiently compared based on information that does not occur in the sequences. A word is an {\em absent word} of some sequence if it does not oc…

0301 basic medicineFOS: Computer and information sciencesFormal Languages and Automata Theory (cs.FL)Computer Science - Formal Languages and Automata TheorySequence alignmentInformation System0102 computer and information sciencesCircular wordAbsent words01 natural sciencesUpper and lower boundsSequence comparisonTheoretical Computer ScienceCombinatorics03 medical and health sciencesComputer Science - Data Structures and AlgorithmsData Structures and Algorithms (cs.DS)Absent wordCircular wordsMathematicsSequenceSettore INF/01 - InformaticaProcess (computing)q-gramComputer Science Applications1707 Computer Vision and Pattern Recognitionq-gramsComposition (combinatorics)Computer Science Applications030104 developmental biologyComputational Theory and MathematicsForbidden words010201 computation theory & mathematicsFocus (optics)Forbidden wordWord (computer architecture)Information SystemsInteger (computer science)

researchProduct

Quantum algorithm for tree size estimation, with applications to backtracking and 2-player games

2017

We study quantum algorithms on search trees of unknown structure, in a model where the tree can be discovered by local exploration. That is, we are given the root of the tree and access to a black box which, given a vertex $v$, outputs the children of $v$. We construct a quantum algorithm which, given such access to a search tree of depth at most $n$, estimates the size of the tree $T$ within a factor of $1\pm \delta$ in $\tilde{O}(\sqrt{nT})$ steps. More generally, the same algorithm can be used to estimate size of directed acyclic graphs (DAGs) in a similar model. We then show two applications of this result: a) We show how to transform a classical backtracking search algorithm which exam…

FOS: Computer and information sciencesQuantum PhysicsSpeedupBacktrackingFOS: Physical sciences0102 computer and information sciences02 engineering and technologyComputational Complexity (cs.CC)Directed acyclic graph01 natural sciencesSearch treeCombinatoricsComputer Science - Computational Complexity010201 computation theory & mathematicsSearch algorithm020204 information systemsComputer Science - Data Structures and AlgorithmsTernary search tree0202 electrical engineering electronic engineering information engineeringData Structures and Algorithms (cs.DS)Quantum algorithmDepth-first searchQuantum Physics (quant-ph)MathematicsProceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing

researchProduct

Computing the original eBWT faster, simpler, and with less memory

2021

Mantaci et al. [TCS 2007] defined the eBWT to extend the definition of the BWT to a collection of strings, however, since this introduction, it has been used more generally to describe any BWT of a collection of strings and the fundamental property of the original definition (i.e., the independence from the input order) is frequently disregarded. In this paper, we propose a simple linear-time algorithm for the construction of the original eBWT, which does not require the preprocessing of Bannai et al. [CPM 2021]. As a byproduct, we obtain the first linear-time algorithm for computing the BWT of a single string that uses neither an end-of-string symbol nor Lyndon rotations. We combine our ne…

FOS: Computer and information sciencesComputer Science - Data Structures and AlgorithmsData Structures and Algorithms (cs.DS)

researchProduct

The rightmost equal-cost position problem.

2013

LZ77-based compression schemes compress the input text by replacing factors in the text with an encoded reference to a previous occurrence formed by the couple (length, offset). For a given factor, the smallest is the offset, the smallest is the resulting compression ratio. This is optimally achieved by using the rightmost occurrence of a factor in the previous text. Given a cost function, for instance the minimum number of bits used to represent an integer, we define the Rightmost Equal-Cost Position (REP) problem as the problem of finding one of the occurrences of a factor whose cost is equal to the cost of the rightmost one. We present the Multi-Layer Suffix Tree data structure that, for…

FOS: Computer and information sciencesOffset (computer science)Computer scienceSuffix treeComputer Science - Information Theorylaw.inventionCombinatoricslawLog-log plotComputer Science - Data Structures and AlgorithmsCompression schemetext compressiondictionary text compressionData Structures and Algorithms (cs.DS)LZ77 compressiondata compressionLossless compressionfull text indexSuffix Tree Data StructuresSettore INF/01 - InformaticaInformation Theory (cs.IT)Data structurePrefixCompression ratioCompression scheme; Constant time; Suffix Tree Data StructuresAlgorithmData compressionConstant time

researchProduct

On the Suitability of Neural Networks as Building Blocks for the Design of Efficient Learned Indexes

2022

With the aim of obtaining time/space improvements in classic Data Structures, an emerging trend is to combine Machine Learning techniques with the ones proper of Data Structures. This new area goes under the name of Learned Data Structures. The motivation for its study is a perceived change of paradigm in Computer Architectures that would favour the use of Graphics Processing Units and Tensor Processing Units over conventional Central Processing Units. In turn, that would favour the use of Neural Networks as building blocks of Classic Data Structures. Indeed, Learned Bloom Filters, which are one of the main pillars of Learned Data Structures, make extensive use of Neural Networks to improve…

Machine LearningLearned Data StructuresNeural NetworksSettore INF/01 - Informatica

researchProduct

A Fast Algorithm Finding the Shortest Reset Words

2012

In this paper we present a new fast algorithm finding minimal reset words for finite synchronizing automata. The problem is know to be computationally hard, and our algorithm is exponential. Yet, it is faster than the algorithms used so far and it works well in practice. The main idea is to use a bidirectional BFS and radix (Patricia) tries to store and compare resulted subsets. We give both theoretical and practical arguments showing that the branching factor is reduced efficiently. As a practical test we perform an experimental study of the length of the shortest reset word for random automata with $n$ states and 2 input letters. We follow Skvorsov and Tipikin, who have performed such a s…

FOS: Computer and information sciencesFormal Languages and Automata Theory (cs.FL)Computer Science - Data Structures and AlgorithmsData Structures and Algorithms (cs.DS)Computer Science - Formal Languages and Automata TheoryComputer Science::Formal Languages and Automata Theory

researchProduct

Indexed Two-Dimensional String Matching

2016

Settore INF/01 - InformaticaTwo-dimensional index data structuresString searching algorithm0102 computer and information sciences02 engineering and technologyApproximate string matching01 natural sciencesCombinatorics010201 computation theory & mathematicsIndex data structures for matrices or imageIndexing for matrices or image0202 electrical engineering electronic engineering information engineeringTwo-dimensional indexing for pattern matching020201 artificial intelligence & image processingString metricMathematics

researchProduct

Quadratic speedup for finding marked vertices by quantum walks

2020

A quantum walk algorithm can detect the presence of a marked vertex on a graph quadratically faster than the corresponding random walk algorithm (Szegedy, FOCS 2004). However, quantum algorithms that actually find a marked element quadratically faster than a classical random walk were only known for the special case when the marked set consists of just a single vertex, or in the case of some specific graphs. We present a new quantum algorithm for finding a marked vertex in any graph, with any set of marked vertices, that is (up to a log factor) quadratically faster than the corresponding classical random walk.

FOS: Computer and information sciencesQuadratic growthQuantum PhysicsQuantum algorithmsSpeedupMarkov chainMarkov chainsProbability (math.PR)FOS: Physical sciencesRandom walkVertex (geometry)CombinatoricsQuadratic equationSearch by random walkQuantum searchComputer Science - Data Structures and AlgorithmsFOS: MathematicsData Structures and Algorithms (cs.DS)Quantum walkQuantum algorithmQuantum Physics (quant-ph)Mathematics - ProbabilityMathematicsQuantum walks

researchProduct

On Combinatorial Generation of Prefix Normal Words

2014

A prefix normal word is a binary word with the property that no substring has more 1s than the prefix of the same length. This class of words is important in the context of binary jumbled pattern matching. In this paper we present an efficient algorithm for exhaustively listing the prefix normal words with a fixed length. The algorithm is based on the fact that the language of prefix normal words is a bubble language, a class of binary languages with the property that, for any word w in the language, exchanging the first occurrence of 01 by 10 in w results in another word in the language. We prove that each prefix normal word is produced in O(n) amortized time, and conjecture, based on expe…

FOS: Computer and information sciencesDiscrete Mathematics (cs.DM)Computer Science - Data Structures and AlgorithmsData Structures and Algorithms (cs.DS)Data_CODINGANDINFORMATIONTHEORYComputer Science - Discrete Mathematics

researchProduct